Summary of Review



This study examines the relationship between high-stakes school accountability and its ef- 
fects upon student test scores and school policies. The authors seek to understand the extent 
to which accountability sanctions and incentives for the poorest-performing schools in 
Florida explain subsequent changes in school practices and policies as well as achievement,
as measured by state assessment data, Stanford-10 assessment data, and surveys of public
school principals. Based on statistical analysis of the lowest-performing schools, the au- 
thors report that accountability incentives and sanctions are related to school practice and 
policy as well as to student achievement. The report uses comprehensive data sources and 
applies appropriate methodologies to address the research question. Its analyses demon- 
strate a mediating relationship for school policies between accountability and achievement 
gains, a finding consistent with both the literature on the subject and common sense. How- 
ever, the report overstates and makes causal claims about the relationship between account- 
ability sanctions and improvements in school achievement. In this way, the report’s title 
and some causal statements in the body of the report are unfortunate in that they overstate 
the report’s sound findings and suggest that vouchers and other accountability measures are 
shown to be the cause of achievement gains in some of Florida’s lowest-performing 
schools. 
Review



I. Introduction 

Do high-stakes accountability systems yield 
higher student achievement, particularly for 
schools facing sanctions? A de facto answer 
of yes was embedded in No Child Left Be- 
hind (NCLB), which mandated adoption of 
high-stakes assessments as a means for driv- 
ing education reform. Not until recently, 
however, have researchers had the rich sys- 
tem-level data with which to examine the 
impact of large-scale accountability reforms 
on the achievement of students at the state 
level. 

If higher achievement follows establishment 
and implementation of sanction- and incen- 
tive-laden accountability systems, what is 
the mechanism behind the increase? Do 
schools facing sanctions alter and modify 
their practices and policies to avoid harsher 
consequences associated with continued low 
performance? Do these modified practices 
and policies lead to greater efficiency that 
manifests in the form of higher student
achievement? The premise is plausible. 

Numerous researchers have documented that 
school practices are often altered in perverse 
ways (e.g., teaching to the test, narrowed 
curriculum, and cheating) in response to 
high-stakes accountability. 1 Clearly, the 
responses to the increased stimulus of ac- 
countability pressure do not necessarily con- 
form to best practice. Are such undesirable 
consequences a fait accompli?

The Urban Institute Working Paper re- 
viewed here is titled “Feeling the Florida 
Heat? How Low-Performing Schools Re- 
spond to Voucher and Accountability Pres- 
sure,” and authored by Cecilia Elena Rouse, 
Jane Hannaway, Dan Goldhaber, and David
Figlio. 2 It presents an analysis of the Florida
A+ accountability system. The authors chal- 
lenge the notion that all schools facing ac- 
countability sanctions will try to “game the 
system” by making superficial changes that 
result in higher test scores but not greater 
learning. They explore principals’ self- 
reported reactions to very low state ratings 
and the threat of sanctions, attempting to 
unpack the “black-box” mediating between 
the accountability system and student 
achievement. They find that schools receiv- 
ing a grade of “F” in the Florida A+ Plan for 
Education and confronted with the incen- 
tives and sanctions associated with such a 
grade responded by altering practice and that 
these changes explained, at least in part, 
subsequent gains in student achievement. 

II. Findings and Conclusions 
of the Report 

The report seeks to explain the impact on 
student achievement and school policy of 
the receipt of a grade of F under Florida’s A+
Plan for Education. In particular, the authors 
examine 35 elementary schools that received 
an F. Under the plan, these schools faced a 
combination of incentives and sanctions, 
including outside evaluation by a commu- 
nity assessment team, technical assistance 
from the Florida Department of Education, 
and supplementary assistance through Flor- 
ida’s Assistance Plus program, as well as the 
possibility for student transfer using vouch- 
ers, called “Opportunity Scholarships.” The 
authors take special care to focus on these 
schools and compare their subsequent per- 
formance to other low-performing schools 
— those that received a D and therefore 
faced a slightly different combination of 
incentives and sanctions. 

The authors contend that incentives and
sanctions might improve school perform- 
ance by changing the policies of key school 
administrators. The theory of action here is 
that intensive scrutiny and supervision under 
the Florida A+ Plan, personal drive to reme- 
diate low performance, and avoidance of the 
stigma of administering an “F” school
will drive school-level changes in policy and
practice, which then lead to higher student
achievement. The sanctions and incentives 
of being labeled an “F” school should alter 
practice and policy and do so in ways that 
truly benefit the students and do not simply 
try to game the system. 

The authors pursue two analytic paths to 
explore these proposed relationships: (1) 
statistical analyses of the impact of an F on 
student test scores, and (2) statistical analy- 
ses of the impact of an F on school policies. 
The combined analyses seek to unpack how 
policy mandates are mediated by school 
policy changes to effect increases in student
achievement. 

Data for the analyses come from two 
sources: (1) administrative data including 
individual student data with characteristics 
such as race, sex, ELL and disability status, 
as well as both high-stakes FCAT test scores 
and low-stakes Stanford-10 test scores; and
(2) survey data of principals of all public 
schools in Florida, conducted in the 2001-02 
and 2003-04 school years. 3 The data sets 
used for the analyses are unique and impres- 
sive given their scope and breadth. More 
importantly, the data sets are adequate to 
address the research questions posed in the 
report. 

As described above, the two types of analyses
parallel these two sets of data and investigate
the relationship between accountability
sanctions and subsequent changes in student
achievement. In examining the impact
of an F on student test scores, the authors
use regression techniques on cross-sectional 
and multi-cohort data, controlling for a vari- 
ety of student and school characteristics. 
According to the researchers, the analyses 
seek to establish the “plausibly causal” (p. 
14) impact of the receipt of a grade of F on 
student test score gains. Technically, and as 
these authors recognize, it is not the F grade 
that causes student test score gains but the 
practices motivated by such a grade. 

The report attempts to control for the possi- 
bility that the changes reflected in assess- 
ment score outcomes do not reflect an im- 
provement in students’ broader knowledge 
base but rather their ability to do well on a 
specific assessment (i.e., that students were 
taught to the test). For this purpose, the au- 
thors investigate results for both the high- 
stakes FCAT test as well as the low-stakes 
Stanford-10 assessment. This was a sensible
approach, but given the highly standardized
nature of the Stanford-10 assessment, the
authors should probably not rule out teaching
to the test as a contributing factor solely
on the basis of consistent gains across both
the FCAT and Stanford-10. Specifically, any
skills acquired through teaching to the 
FCAT such as improved test-taking strate- 
gies would likely transfer well to the Stan- 
ford-10. Subsequent analyses using the prin- 
cipal survey data indicate that achievement 
gains are only modestly attributable to nar- 
rowing the curriculum to focus on test per- 
formance. 

The report finds that across FCAT and Stan- 
ford-10 reading and mathematics exams, 
students in F-graded schools demonstrated 
larger test score increases than did students 
in other schools. Since students in “F” 
schools have very low achievement, this 
finding is not surprising and in fact is consistent
with regression to the mean. 4 It is
unfortunate that the authors report this information
separately from later analyses
where they show that the increases observed 
are statistically significant even after ac- 
counting for these regression effects. 
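
The regression-to-the-mean point can be illustrated with a small simulation. The sketch below is purely hypothetical: the number of schools, the noise level, and the 5 percent cutoff are assumptions chosen for illustration and are not taken from the report or from Florida data. It generates pre- and post-test scores with no treatment effect at all and shows that the lowest scorers on the pre-test still tend to post larger gains than everyone else.

```python
import random
import statistics

def simulate_regression_to_mean(n_schools=5000, noise_sd=1.0, seed=0):
    """Simulate pre/post scores with NO treatment effect (all values hypothetical)."""
    rng = random.Random(seed)
    # Each hypothetical school has a stable "true" level plus year-specific noise.
    true_level = [rng.gauss(0.0, 1.0) for _ in range(n_schools)]
    pre = [t + rng.gauss(0.0, noise_sd) for t in true_level]
    post = [t + rng.gauss(0.0, noise_sd) for t in true_level]

    # Treat roughly the bottom 5 percent on the pre-test as the "F" group.
    cutoff = sorted(pre)[n_schools // 20]
    low_gain = [b - a for a, b in zip(pre, post) if a <= cutoff]
    other_gain = [b - a for a, b in zip(pre, post) if a > cutoff]
    return statistics.mean(low_gain), statistics.mean(other_gain)

low, other = simulate_regression_to_mean()
print(f"mean gain, lowest-scoring group: {low:.2f}")
print(f"mean gain, all other schools:    {other:.2f}")
```

Because the simulated gains arise without any intervention, a raw comparison of gains between the lowest-scoring schools and other schools cannot by itself distinguish an accountability effect from this statistical artifact.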

Because of the large number of schools in 
the dataset, the authors examine first-time 
“F” schools separately from repeat “F” 
schools, and they find consistent and signifi- 
cant differences in test score increases be- 
tween the two types of schools. In particular, 
the authors report that as schools remain in 
“F” status, accountability pressure increases 
and so do test scores. Given the small sam- 
ple size associated with this comparison, it is 
impossible to completely rule out the possi- 
bility that these increases were the result of 
unintended policies such as extensive test 
prep. 

To investigate the impact of “F”-school 
status on school policies, the authors organ- 
ized schools’ policies into various domains, 
reflecting types of changes that schools 
would likely make. The researchers found 
that schools facing the increased pressures 
associated with F schools tended to adopt 
block scheduling, increase time for collabo- 
rative planning and class prep for teachers, 
and otherwise reorganize the school sched- 
uling structure. In addition, the authors 
found that “F” schools tended to focus on 
low-performing students, increase the time 
spent on instruction, and increase resources 
available to teachers. The authors also found 
some evidence of narrowing of the curricu- 
lum, with an increasing focus on the tested 
subjects. 

Overall, the report’s findings are not surpris- 
ing and reflect a classic mediating relation- 
ship. Increased pressure applied to low- 
performing schools within the Florida A+ 
Plan led to changes in practice and policy at 
the school level which in turn accounted for 
a large proportion of the gains in achievement
observed for these schools.

III. Review of the Report’s 
Methodologies 

What the report investigates, although never 
labeled as such, is a “mediating relation- 
ship,” where school sanctions alter school 
policies, which in turn increase student 
achievement. The report uses a variety of 
appropriate statistical models to estimate the 
impact of receiving an F grade on student 
achievement and school policies. As dis- 
cussed in this review, the methodologies 
generally support the report’s goal of inves- 
tigating the relationships between policies 
and changes in student and school academic 
performance. However, given the observa- 
tional nature of the data, causal attribution is 
difficult to establish, as the authors recog- 
nize in several parts of the text yet seem- 
ingly ignore in others. 
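
The logic of a mediating relationship can be sketched with simulated data. The example below is an illustration under stated assumptions, not the authors' model: the variable names (f_grade, policy, gain), the effect sizes, and the sample size are all hypothetical. It regresses the gain on the F indicator alone, then adds a policy variable, and reports the share of the F coefficient that the policy variable absorbs.

```python
import random

def slope_one(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def slopes_two(x1, x2, y):
    """Slopes from an OLS regression of y on x1 and x2 with an intercept."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations directly
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate_mediation_share(n=20000, seed=1):
    """Hypothetical data: F status shifts policy, and policy shifts the gain."""
    rng = random.Random(seed)
    f_grade = [1.0 if rng.random() < 0.2 else 0.0 for _ in range(n)]
    policy = [0.8 * f + rng.gauss(0.0, 1.0) for f in f_grade]
    gain = [0.3 * f + 0.5 * p + rng.gauss(0.0, 1.0)
            for f, p in zip(f_grade, policy)]
    total = slope_one(f_grade, gain)               # F coefficient, no policy control
    direct, _ = slopes_two(f_grade, policy, gain)  # F coefficient, policy controlled
    return total, direct, 1.0 - direct / total

total, direct, share = simulate_mediation_share()
print(f"F coefficient without policy variable: {total:.2f}")
print(f"F coefficient with policy variable:    {direct:.2f}")
print(f"share of F coefficient absorbed:       {share:.0%}")
```

A drop in the coefficient of this kind is what an associational mediation analysis reports. With observational data it does not establish that the grade caused the policy change or that the policy change caused the gain.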

IV. Review of the Validity of the 
Findings And Conclusions 

The report’s primary strength is its use of 
comprehensive datasets. The authors thor- 
oughly analyze the data to investigate the 
impact of accountability sanctions on both 
student achievement and school policies. 
Many of the findings of the study are, in 
fact, consistent with what is found in the 
existing literature. To address concerns that 
gains in student achievement might be solely 
a function of schools “gaming” the system, 
the authors use multiple achievement tests to 
show that student achievement gains are 
consistent and meaningful. As mentioned 
previously, consistent gains across both the 
FCAT and Stanford-10 do not completely
eliminate the possibility of less beneficial
policy changes leading to increases in student
test scores. However, the extensive
survey data does address many of these concerns,
since the researchers are able to correlate
specific policy changes with the observed
test-score gains.

The most prominent shortcoming of the report
is its tendency to overstate the predictive
relationships indicated by its statistical
analyses. The authors have only demonstrated
that school policies and practices
function as a mediating variable between 
accountability sanctions and student 
achievement. The data and methods allow 
for nothing more, and the last paragraph on 
page 34 strongly suggests this mediating 
relationship: 

Across the specifications ..., the es- 
timated effect of “F” grade receipt 
decreases with the inclusion of these 
policy variables, with percentage re- 
ductions that range from very modest 
to very large. The share of the test 
score gain associated with “F” grade 
receipt is at least 15 percent with re- 
gard to reading and at least 38 per- 
cent with regard to mathematics. 
Moreover, virtually the entire ex- 
plained portion of the test score gains 
associated with an “F” grade is ap- 
parently due to the five policy do- 
mains that we found to have the 
strongest cross-sectional relationship 
with student test score gains. 

The use of “associational” language here is 
more appropriate. Yet scattered throughout 
the report, the authors write as though the 
accountability pressure causes the improve- 
ment in student achievement. For example, 
on page 22 the authors propose to “further 
identify the causal effect of receipt of an ‘F’ 
grade on student test score improve- 
ments...,” even though the regression dis- 
continuity analyses do not support such 
causal attribution. 5 Earlier, on page 14, the 
authors propose to estimate the “plausibly 
causal impact of receipt of an F grade on
student test score” performance. More
broadly, the report’s title suggests that 
vouchers, among all the other accountability 
provisions of the Florida A+ system, led to 
increased student achievement and that 
without these threats, the policies and prac- 
tices at the school would not have changed. 
However, there is no supporting evidence 
that this is true. Moreover, even if it is true 
that the Florida policy of vouchers plus 
other accountability provisions did lead to 
the changes in policy and practice, nothing 
in this new research allows a policy maker 
to single out either vouchers or other ac- 
countability provisions (or a combination) as 
having such an effect. Given that
different states have incorporated different incentives
and sanctions into their accountability
systems, a different set of sanctions or
incentives conceivably might lead to the 
same achievement outcomes. The title of the 
report suggests that it might be these components
(e.g., vouchers) of the accountability
system that are responsible, yet the report
does not provide evidence for that claim. 

V. Usefulness of the Report
for Guidance of Policy 
and Practice 

As the authors straightforwardly acknowl- 
edge in the final paragraph of the report, it is 
not clear that the findings from Florida will
transfer to other locales adopting an ac- 
countability program similar to the Florida 
A+ program. The hypothetical question of 
interest is whether the A+ accountability
system, transplanted to another state,
would yield similar results. Given the idio- 
syncrasies of states, there is no way of de- 
finitively answering such a hypothetical 
question. Perhaps a good way to think about 
the significance of these findings is that 
other states should investigate similar ques- 
tions using their own accountability systems. 

As researchers continue to unpack the
“black-box” of school accountability, they 
are certain to find particular school prac- 
tices, already documented in the literature, 
that produce the gains in achievement eve- 
ryone desires. The trick is to identify the 
policy levers that are most likely to give rise 
to widespread adoption of effective practices.
The new Urban Institute report suggests
that one or more elements of the Florida
accountability system may offer such a
lever. 

However, because changes in school policy 
and practice can occur for many reasons,
this research should not be read to show that
the accountability system “led to” or 
“caused” the student achievement increases. 
Nor does the new report consider whether 
the accountability levers in Florida are the 
most effective means of quickly and benefi- 
cially transforming the policies and practices 
of schools in a way that leads to increased 
student achievement. The results of this 
study indicate that Florida’s A+ Program
and its associated sanctions led to some of 
these transformations. Whether or not it is 
the best or the only means of achieving these 
ends is the more relevant question. 
NOTES & REFERENCES



1 See Koretz, Daniel (2003). “Using Multiple Measures to Address Perverse Incentives and Score
Inflation.” Educational Measurement: Issues and Practice 22, no. 2: 18-26.

Linn, Robert. (2005) “Alignment, High Stakes, and the Inflation of Test Scores,” in Uses and Mis- 
uses of Data for Educational and Accountability Improvement (ed. by Joan L. Herman and 
Edward H. Haertel) (Malden, Massachusetts: Blackwell Publishing), pp. 99-118. 

2 Rouse, C. E., Hannaway, J., Goldhaber, D., & Figlio, D. (2007). Feeling the Florida Heat? How
Low-Performing Schools Respond to Voucher and Accountability Pressure (Working Paper
no. 13). Washington DC: Urban Institute’s National Center for Analysis of Longitudinal 
Data in Education Research. This Urban Institute report was published as a Working Paper, 
and the authors welcomed comments. This review will hopefully be useful to the authors in 
that regard. 

3 The response rate for the survey was an impressive 70%. 

4 Regression to the mean, called regression toward mediocrity by Galton, denotes the fact that post-observations
always are, on average, less extreme than pre-observations. In particular, using
standard deviation units, low-achieving students on a pre-test will on average be less ex- 
treme in standard deviation units on the post-test. 

5 Regression discontinuity designs (those that compare data from participants assigned to either the
program or comparison groups based, in this case, on whether the school is labeled D or F)
allow for the attribution of cause only under strict assumptions of a continuous relationship 
between assignment and outcome variables near the treatment cutoff. This condition is vio- 
lable if individuals can exert control over values on the assignment variable, which is the 
case with schools receiving a grade of F. 